Subgroup Discovery with CN2-SD
Abstract
This paper investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of the population that are sufficiently large and statistically unusual. The paper presents a subgroup discovery algorithm, CN2-SD, developed by modifying parts of the CN2 classification rule learner: its covering algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. Experimental evaluation of CN2-SD on 23 UCI data sets shows a substantial reduction in the number of induced rules, increased rule coverage and rule significance, and slight improvements in the area under the ROC curve, when compared with the CN2 algorithm. Application of CN2-SD to a large traffic accident data set confirms these findings.
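The abstract does not name the modified search heuristic. The related APRIORI-SD entry on this page mentions weighted relative accuracy, which trades rule coverage against the gain in class probability: WRAcc(Cond -> Class) = p(Cond) * (p(Class | Cond) - p(Class)). The following is a minimal sketch of that measure on unweighted examples, not the authors' implementation; the function name `wracc` and the boolean-list input format are assumptions made for illustration.

```python
def wracc(covered, positive):
    """Weighted relative accuracy of a rule Cond -> Class.

    covered[i]  is True when example i satisfies the rule condition.
    positive[i] is True when example i belongs to the target class.
    Both argument names are illustrative, not taken from the paper.
    """
    n = len(covered)
    if n == 0 or n != len(positive):
        raise ValueError("inputs must be non-empty and of equal length")

    n_cond = sum(covered)                 # examples covered by the rule
    if n_cond == 0:
        return 0.0                        # an empty rule is not unusual
    n_class = sum(positive)               # examples of the target class
    n_both = sum(1 for c, p in zip(covered, positive) if c and p)

    coverage = n_cond / n                 # p(Cond)
    precision = n_both / n_cond           # p(Class | Cond)
    prior = n_class / n                   # p(Class)
    return coverage * (precision - prior)


# Toy data: the rule covers 4 of 10 examples, 3 of them in the target class.
covered = [True, True, True, True, False, False, False, False, False, False]
positive = [True, True, True, False, True, False, False, False, False, False]
score = wracc(covered, positive)
```

On this toy data the score is about 0.14 (coverage 0.4 times a precision gain of 0.35). A rule that covers many examples but matches the class prior scores 0, which reflects the "sufficiently large and statistically unusual" goal stated in the abstract.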
Related papers
Analysis of Example Weighting in Subgroup Discovery by Comparison of Three Algorithms on a Real-life Data Set
This paper investigates the implications of example weighting in subgroup discovery by comparing three state-of-the-art subgroup discovery algorithms, APRIORI-SD, CN2-SD, and SubgroupMiner on a real-life data set. While both APRIORI-SD and CN2-SD use example weighting in the process of subgroup discovery, SubgroupMiner does not. Moreover, APRIORI-SD uses example weighting in the post-processing...
APRIORI-SD: Adapting Association Rule Learning to Subgroup Discovery
This paper presents a subgroup discovery algorithm APRIORI-SD, developed by adapting association rule learning to subgroup discovery. The paper contributes to subgroup discovery, to a better understanding of the weighted covering algorithm, and the properties of the weighted relative accuracy heuristic by analyzing their performance in the ROC space. An experimental comparison with rule learn...
Using Subgroup Discovery to Analyze the UK Traffic Data
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can be adapted also to subgroup discovery. Such an adaptation has already been done for the CN2 rule learning algorithm. In previous work this new algorithm, called CN2-SD, has been described in detail and applied to the well known UCI data sets. This paper summarizes the mo...
Making CN2-SD subgroup discovery algorithm scalable to large size data sets using instance selection
The subgroup discovery, domain of application of CN2-SD, is defined as: "given a population of individuals and a property of those individuals, we are interested in finding a population of subgroups as large as possible and have the most unusual statistical characteristic with respect to the property of interest". The subgroup discovery algorithm CN2-SD, based on a separate and conquer strate...
Experimental Comparison of Three Subgroup Discovery Algorithms: Analysing Brain Ischaemia Data
This paper presents experimental results of subgroup discovery algorithms SD, CN2-SD and Apriori-SD implemented in the Orange data mining software. The experimental comparison shows that algorithms perform quite differently on data discretized in different ways. From the experiments, performed in the brain ischemia domain, it is impossible to conclude which discretization is the most adequate f...
Journal:
- Journal of Machine Learning Research
Volume 5
Publication date: 2004